Table of contents

  1. Loading data
  2. Data processing
  3. Correlation
    1. Correlation of selected parameters
    2. Correlation matrix

Loading libraries

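library(dplyr)     # data manipulation (select, %>%)
library(ggplot2)   # static plots
library(corrplot)  # correlation-matrix plots
library(caret)     # modelling utilities
library(plotly)    # interactive plots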
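A quick way to confirm that all five packages are installed before the report is knitted is a `requireNamespace()` check. This is a sketch, not part of the original analysis; the helper name `missing_packages` is hypothetical.

```r
# Hypothetical helper: return the names of packages that cannot be loaded
missing_packages <- function(pkgs) {
  # requireNamespace() returns FALSE instead of raising an error when a package is absent
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

required <- c("dplyr", "ggplot2", "corrplot", "caret", "plotly")
not_installed <- missing_packages(required)
if (length(not_installed) > 0) {
  message("Install before running: ", paste(not_installed, collapse = ", "))
}
```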

Loading data from the file

all_data <- read.csv(file="elektrownie.csv", header=TRUE, sep=",")
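Since R 4.0.0, `read.csv()` defaults to `stringsAsFactors = FALSE`, so a text column such as `date` is read as character. The basic statistics below list the six most frequent `date` values plus `(Other)`, which is how `summary()` reports a factor column, so the data there were read with factors. The sketch below uses a tiny inline CSV (made-up values and column names, not taken from `elektrownie.csv`) to show the difference.

```r
# Tiny made-up CSV with the same date format as the `date` column (illustrative values only)
csv_text <- "id,date,energy
1,10/10/2012 10:00,0.10
2,10/10/2012 10:00,0.20
3,10/10/2012 11:00,0.30"

# header = TRUE and sep = "," are already the read.csv() defaults
as_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)  # same as the R >= 4.0 default
as_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)   # text columns become factors

class(as_chr$date)    # "character"
class(as_fct$date)    # "factor"
summary(as_fct$date)  # counts per level, the layout seen in the statistics table
```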

Data-processing code

Selecting and renaming columns

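# Drop the `icon` column and the whole block of columns from `tempi` to `irri_pvgis_mod`
needed_data <- all_data %>% select(-icon, -(tempi:irri_pvgis_mod))
# Rename positionally: the vector must have exactly as many entries (20) as remaining columns
colnames(needed_data) <- c("measurementId", "place", "model", "brand",
                           "latitude", "longitude", "age", "year", "day", "hour", "date",
                           "temperature", "radiation", "pressure", "windspeed", "humidity",
                           "dewpoint", "bearing", "cloudcover", "energy")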
# Parse the timestamp once, then extract day of month, month and hour as numbers
parsed_date <- as.POSIXct(as.character(needed_data$date), format = "%m/%d/%Y %H:%M")
needed_data$only_day   <- as.numeric(format(parsed_date, "%d"))
needed_data$only_month <- as.numeric(format(parsed_date, "%m"))
needed_data$only_hour  <- as.numeric(format(parsed_date, "%H"))
# Text label per place; paste0() avoids the double space paste() would add after "place: "
needed_data$place_string <- paste0("place: ", as.character(needed_data$place))
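An alternative to formatting timestamps as strings and converting them back to numbers is to parse them into `POSIXlt` and read the calendar fields directly; note that `mon` is zero-based. This sketch assumes the `%m/%d/%Y %H:%M` format used above; `tz = "UTC"` is an added assumption that keeps daylight-saving gaps in the local time zone from turning valid timestamps into `NA`. The sample timestamps are made up.

```r
# Made-up timestamps in the same format as the `date` column
dates <- c("10/10/2012 10:00", "03/07/2013 14:00", "12/31/2012 20:00")

# Parse once; POSIXlt exposes calendar fields as list components
lt <- as.POSIXlt(dates, format = "%m/%d/%Y %H:%M", tz = "UTC")

only_day   <- lt$mday      # day of month, 1-31
only_month <- lt$mon + 1   # POSIXlt months run 0-11
only_hour  <- lt$hour      # hour, 0-23

only_day    # 10  7 31
only_month  # 10  3 12
only_hour   # 10 14 20
```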

Filling in zero values
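The report does not show the code for this step. One possible approach, sketched here purely as an assumption, is to treat zero readings of a chosen column as missing and replace them with the mean of the non-zero readings from the same location and hour. Zero is a legitimate value for quantities such as night-time radiation or energy, so which columns (and which zeros) to impute has to be decided from the data. The helper `impute_zeros()` and the toy data frame are hypothetical.

```r
# Hypothetical helper: replace zeros in `col` by the mean of non-zero values in the same group
impute_zeros <- function(df, col, group_cols) {
  key <- interaction(df[group_cols], drop = TRUE)  # one factor level per group combination
  x <- df[[col]]
  nonzero <- !is.na(x) & x != 0
  is_zero <- !is.na(x) & x == 0
  # Group means over non-zero values; groups with no non-zero values get NA
  group_mean <- tapply(x[nonzero], key[nonzero], mean)
  fill <- as.vector(group_mean[as.character(key[is_zero])])
  fill[is.na(fill)] <- 0  # nothing to impute from: keep the zero
  x[is_zero] <- fill
  df[[col]] <- x
  df
}

toy <- data.frame(
  place  = c(1, 1, 1, 2, 2),
  hour   = c(12, 12, 12, 12, 12),
  energy = c(0, 0.4, 0.6, 0, 0)
)
impute_zeros(toy, "energy", c("place", "hour"))$energy  # 0.5 0.4 0.6 0.0 0.0
```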

Extracting sensor models, brands and locations

The dataset contains data for 17 distinct locations. A location is defined by …

# Distinct brand/model pairs
idbrands_with_models <- unique(needed_data[c("brand", "model")])
# Distinct model, brand and place ids
idmodels <- unique(needed_data$model)
idbrands <- unique(needed_data$brand)
idplaces <- unique(needed_data$place)
# Distinct coordinate pairs, without and with the place id
gps <- unique(needed_data[c("latitude", "longitude")])
gps_with_places_id <- unique(needed_data[c("latitude", "longitude", "place")])
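To check the count of 17 locations, and to confirm that each `place` id corresponds to exactly one coordinate pair, the unique combinations can be counted. The sketch below runs on a made-up miniature of `needed_data`; on the real data the same expressions would be applied to `needed_data` itself.

```r
# Made-up miniature of needed_data (real values differ)
toy <- data.frame(
  place     = c(0.000, 0.000, 0.025, 0.025, 0.050),
  latitude  = c(0.437, 0.437, 0.439, 0.439, 0.415),
  longitude = c(0.620, 0.620, 0.630, 0.630, 0.154)
)

n_places <- length(unique(toy$place))                      # distinct place ids
n_coords <- nrow(unique(toy[c("latitude", "longitude")]))  # distinct coordinate pairs
gps_with_places_id <- unique(toy[c("latitude", "longitude", "place")])

# One-to-one: each place id appears in exactly one unique triple,
# and there are as many coordinate pairs as place ids
one_to_one <- !anyDuplicated(gps_with_places_id$place) && n_places == n_coords

c(places = n_places, coords = n_coords)  # places = 3, coords = 3
one_to_one                               # TRUE
```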

Basic statistics

summary(needed_data), 235,790 rows; one row per numeric variable:

Variable        Min.      1st Qu.   Median    Mean      3rd Qu.   Max.
measurementId   1         99646     158594    152703    217541    276488
place           0.0000    0.1000    0.2250    0.2147    0.3250    0.4250
model           0.0000    0.1670    0.2080    0.2426    0.2920    0.7500
brand           0.0000    0.0830    0.1670    0.1519    0.1670    0.4170
latitude        0.4150    0.4370    0.4370    0.4495    0.4390    0.5530
longitude       0.1540    0.6200    0.6240    0.5711    0.6300    0.6910
age             0.0000    0.0000    0.1250    0.3145    0.7190    1.0000
year            2012      2012      2012      2012      2013      2013
day             0.0000    0.2520    0.4770    0.4812    0.7100    1.0000
hour            0.000     0.222     0.500     0.500     0.778     1.000
temperature     0.0450    0.2120    0.3480    0.3734    0.5300    0.8180
radiation       0.0000    0.0000    0.0350    0.1091    0.2040    0.7100
pressure        0.0000    0.7480    0.7530    0.6504    0.7550    0.7690
windspeed       0.00000   0.04200   0.06600   0.07622   0.10200   0.69600
humidity        0.1600    0.5400    0.7000    0.6844    0.8400    1.0000
dewpoint        0.1390    0.5350    0.6190    0.6055    0.6830    0.8650
bearing         0.0000    0.3000    0.4780    0.4512    0.6600    0.7690
cloudcover      0.000     0.230     0.310     0.359     0.510     1.000
energy          0.0000    0.0000    0.0490    0.1688    0.3320    1.0000
only_day        1.00      8.00      16.00     15.76     23.00     31.00
only_month      1.000     4.000     7.000     6.527     10.000    12.000
only_hour       1         6         11        11        16        20

date (factor; the six most frequent values): 10/10/2012 10:00, 11:00, 12:00, 13:00, 14:00 and 15:00 occur 17 times each; (Other): 235688.
place_string: Length 235790, Class character, Mode character.
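The numeric statistics can be computed in a transposed layout (one row per variable) with `quantile()` and `mean()`; `summary()` itself prints one column per variable, which is what produces the very wide output. This is a sketch on made-up data, and the helper name `stats_table` is hypothetical. `summary()` uses the default type-7 quantiles, as `quantile()` does, but rounds to four significant digits, which this helper does not.

```r
# Hypothetical helper: one row per numeric column, with the same statistics as summary()
stats_table <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]  # keep numeric columns only
  t(vapply(num, function(x) {
    q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE, na.rm = TRUE)
    c(Min = q[1], Q1 = q[2], Median = q[3], Mean = mean(x, na.rm = TRUE),
      Q3 = q[4], Max = q[5])
  }, numeric(6)))
}

toy <- data.frame(
  energy = c(0, 0.1, 0.2, 0.3, 0.4),
  hour   = c(6, 8, 10, 12, 14),
  label  = letters[1:5]  # non-numeric, dropped by the helper
)
round(stats_table(toy), 4)
```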

Correlation

Correlation matrix of all parameters
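The plot for this section is not part of the text. As a sketch of how a correlation matrix of the numeric columns can be computed and drawn with the `corrplot` package loaded earlier: `cor()` with `use = "pairwise.complete.obs"` tolerates missing values, and `corrplot()` draws the matrix. The data are synthetic, the column names only mimic those in `needed_data`, and the plotting call is skipped if `corrplot` is not installed.

```r
set.seed(1)
n <- 200
radiation <- runif(n)
# Synthetic stand-in for a few numeric columns of needed_data
toy <- data.frame(
  radiation   = radiation,
  energy      = 0.8 * radiation + rnorm(n, sd = 0.05),  # constructed to depend on radiation
  humidity    = runif(n),
  temperature = runif(n)
)

# Pearson correlation matrix of all numeric columns
num_cols <- vapply(toy, is.numeric, logical(1))
corr_mat <- cor(toy[num_cols], use = "pairwise.complete.obs")

if (requireNamespace("corrplot", quietly = TRUE)) {
  # Colour-coded upper triangle; the diagonal is always 1, so it is omitted
  corrplot::corrplot(corr_mat, method = "color", type = "upper", diag = FALSE)
}
round(corr_mat, 2)
```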

Correlation of selected parameters
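Which parameters were selected is not recoverable from this text. Assuming, purely for illustration, that one pair of interest is `radiation` versus `energy`, a single coefficient with a significance test can be obtained with `cor.test()`. Spearman's rank correlation is a useful cross-check when the relationship may be monotonic but not linear. The data below are synthetic.

```r
set.seed(42)
n <- 300
radiation <- runif(n)
# Synthetic energy: saturating (monotonic, non-linear) in radiation, plus noise
energy <- 1 - exp(-3 * radiation) + rnorm(n, sd = 0.05)

pearson  <- cor.test(radiation, energy, method = "pearson")
spearman <- cor.test(radiation, energy, method = "spearman", exact = FALSE)

c(pearson = unname(pearson$estimate), spearman = unname(spearman$estimate))
pearson$p.value < 0.05
```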